Journals
Journal:
NAR GENOMICS AND BIOINFORMATICS
ISSN:
2631-9268
Year:
2022
Vol.:
4
N°:
3
pp.
lqac067
Alternative splicing (AS) plays a key role in cancer: all its hallmarks have been associated with different mechanisms of abnormal AS. The improvement of the human transcriptome annotation and the availability of fast and accurate software to estimate isoform concentrations have boosted the analysis of transcriptome profiling from RNA-seq. The statistical analysis of AS is a challenging problem that is not yet fully solved. We have included in EventPointer (EP), a Bioconductor package, a novel statistical method that can use the bootstrap output of pseudoaligners. We compared it with other state-of-the-art algorithms for analyzing AS. Its performance is outstanding for shallow sequencing conditions. The statistical framework is very flexible since it is based on design and contrast matrices. EP now includes a convenient tool to find the primers needed to validate the discoveries using PCR. We also added a statistical module to study alterations in protein domains related to AS. Applying it to 9514 patients from TCGA and TARGET in 19 different tumor types led to two conclusions: (i) aberrant alternative splicing alters the relative presence of protein domains and (ii) the number of enriched domains is strongly correlated with the age of the patients.
Journal:
BIOINFORMATICS
ISSN:
1367-4803
Year:
2022
Vol.:
38
N°:
6
pp.
1491 - 1496
Motivation: Isoform deconvolution is an NP-hard problem. The accuracy of the proposed solutions is far from perfect. At present, it is not known whether gene structure and isoform concentration can be uniquely inferred given paired-end reads, and there is no objective method to select the fragment length that maximizes the number of identifiable genes. Different pieces of evidence suggest that the optimal fragment length is gene-dependent, stressing the need for a method that selects the fragment length according to a reasonable trade-off across all the genes in the genome. Results: A gene is considered to be identifiable if both the structure and the concentration of its transcripts can be obtained unambiguously. Here, we present a method to determine the identifiability of this deconvolution problem. Assuming a given transcriptome and that the coverage is sufficient to interrogate all junction reads of the transcripts, this method states whether or not a gene is identifiable given the read length and fragment length distribution. Applying this method to different read and fragment length combinations, the optimal average fragment length for the human transcriptome is around 400-600 nt for coding genes and 150-200 nt for long non-coding RNAs. The optimal read length is the largest one that fits in the fragment length. The potential benefit of combining several libraries to reconstruct the transcriptome is also discussed. Combining two libraries of very different fragment lengths results in a significant improvement in gene identifiability.
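As a rough illustration of what identifiability means here, the sketch below covers only the simpler half of the problem: with the transcript structures assumed known, concentrations can be recovered uniquely when the matrix relating observable features to transcripts has full column rank. This rank test, the function names and the toy gene are assumptions made for illustration; they are not the method described in the abstract, which also addresses gene structure and the fragment length distribution.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a small matrix via exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a non-zero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate the column entries below the pivot row.
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def concentrations_identifiable(feature_by_transcript):
    """True if the feature x transcript matrix has full column rank.

    Simplifying assumption: structures are known and each row is a feature
    (exon or junction signal) that the library can actually observe.
    """
    n_transcripts = len(feature_by_transcript[0])
    return matrix_rank(feature_by_transcript) == n_transcripts


# Hypothetical gene with two distant cassette exons E1 and E2.
# Columns: transcripts containing (E1+E2, E1 only, E2 only, neither).
short_fragments = [
    [1, 1, 1, 1],  # constitutive region
    [1, 1, 0, 0],  # E1 inclusion
    [1, 0, 1, 0],  # E2 inclusion
]
# A longer fragment adds a feature that links E1 and E2 in one molecule.
long_fragments = short_fragments + [[1, 0, 0, 0]]

if __name__ == "__main__":
    print(concentrations_identifiable(short_fragments))
    print(concentrations_identifiable(long_fragments))
```

In this toy case, three features cannot separate four transcripts, while the extra linking feature supplied by longer fragments makes the system solvable, which is in line with the abstract's point that fragment length affects identifiability.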
Journal:
BIOINFORMATICS
ISSN:
1367-4803
Year:
2022
Vol.:
38
N°:
3
pp.
844 - 845
Motivation: Discover is an algorithm developed to identify mutually exclusive genomic events. Its main contribution is a statistical analysis based on the Poisson-Binomial (PB) distribution to take into account the mutation rate of genes and samples. Discover is very effective at identifying mutually exclusive mutations, at the expense of speed in large datasets: the PB is computationally costly to estimate, and checking all the potential mutually exclusive alterations requires millions of tests. Results: We have implemented a new version of the package, called Rediscover, that implements exact and approximate computations of the PB. The exact implementation in Rediscover is slightly faster than Discover for large and medium-sized datasets. The approximation is 100-1000 times faster on these datasets, making it possible to get results in less than a minute on a standard desktop. The memory footprint is also smaller in Rediscover. The new package is available at CRAN and provides some functions to integrate its usage with other R packages such as maftools and TCGAbiolinks. Availability and implementation: Rediscover is available at CRAN (https://cran.r-project.org/web/packages/Rediscover/index.html).
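To make the trade-off between exact and approximate PB computations concrete, the following is a minimal Python sketch, not Rediscover's code. It assumes the per-sample probabilities that two genes are altered together have already been estimated, and it uses a plain normal approximation as a stand-in; the approximations actually implemented in Rediscover may differ. All function and variable names are hypothetical.

```python
import math


def poisson_binomial_pmf(probs):
    """Exact PMF of the number of successes among independent Bernoulli
    trials with different success probabilities (O(n^2) convolution)."""
    pmf = [1.0]  # With zero trials, zero successes has probability 1.
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)  # this trial fails
            nxt[k + 1] += mass * p      # this trial succeeds
        pmf = nxt
    return pmf


def lower_tail_exact(probs, k):
    """P(X <= k) computed from the exact PMF."""
    return sum(poisson_binomial_pmf(probs)[: k + 1])


def lower_tail_normal(probs, k):
    """P(X <= k) from a normal approximation with continuity correction.

    Assumes at least one probability lies strictly between 0 and 1,
    so that the variance is positive.
    """
    mu = sum(probs)
    var = sum(p * (1.0 - p) for p in probs)
    z = (k + 0.5 - mu) / math.sqrt(var)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    # Hypothetical per-sample alteration probabilities for two genes.
    gene_a = [0.30, 0.10, 0.25, 0.40, 0.15] * 40
    gene_b = [0.20, 0.35, 0.10, 0.30, 0.25] * 40
    co_alteration = [a * b for a, b in zip(gene_a, gene_b)]
    observed = 3  # hypothetical number of samples with both genes altered
    print(lower_tail_exact(co_alteration, observed))
    print(lower_tail_normal(co_alteration, observed))
```

A small lower-tail probability means that fewer co-alterations were observed than expected under independence, which is the signal of mutual exclusivity. The exact recursion grows quadratically with the number of samples, whereas the approximation needs a single pass, which hints at why an approximate PB can be much faster when millions of gene pairs are tested.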
Journal:
SCIENTIFIC REPORTS
ISSN:
2045-2322
Year:
2020
Vol.:
10
N°:
1
The advent of RNA-seq technologies has switched the paradigm of genetic analysis from a genome-based to a transcriptome-based perspective. Alternative splicing generates functional diversity in genes, but the precise functions of many individual isoforms are yet to be elucidated. Gene Ontology was developed to annotate gene products according to their biological processes, molecular functions and cellular components. Although a single gene may have several gene products, most annotations are not isoform-specific and do not distinguish the functions of the different proteins originating from a single gene. Several approaches have tried to automatically annotate ontologies at the isoform level, but this has proven to be a daunting task. We have developed ISOGO (ISOform + GO function imputation), a novel algorithm to predict the function of coding isoforms based on their protein domains and their correlation of expression across 11,373 cancer patients. Combining these two sources of information outperforms previous approaches: it provides an area under the precision-recall curve (AUPRC) five times larger than previous attempts, and the median AUROC of assigned functions to genes is 0.82. We tested ISOGO predictions on some genes with isoform-specific functions (BRCA1, MADD, VAMP7 and ITSN1) and they were coherent with the literature. In addition, we examined whether the main isoform of each gene (as predicted by APPRIS) was the most likely to have the annotated gene functions, and this was the case in 99.4% of the genes. We also evaluated the predictions for isoform-specific functions provided by the CAFA3 challenge, and the results were also convincing. To make these results available to the scientific community, we have deployed a web application to consult ISOGO predictions (https://biotecnun.unav.es/app/isogo). Initial data, website link, isoform-specific GO function predictions and R code are available at https://gitlab.com/icassol/isogo.
Journal:
BMC GENOMICS
ISSN:
1471-2164
Year:
2019
Vol.:
20
N°:
Art. 521
Background: Splicing is a genetic process that has important implications in several diseases, including cancer. Deciphering the complex rules of splicing regulation is crucial to understand and treat splicing-related diseases. Splicing factors and other RNA-binding proteins (RBPs) play a key role in the regulation of splicing. The specific binding sites of an RBP can be measured using CLIP experiments. However, to unveil which RBPs regulate a condition, it is necessary to have a priori hypotheses, as a single CLIP experiment targets a single protein. Results: In this work, we present a novel methodology to predict context-specific splicing factors from transcriptomic data. For this, we systematically collect, integrate and analyze more than 900 CLIP experiments stored in four CLIP databases: POSTAR2, CLIPdb, DoRiNA and StarBase. The analysis of these experiments shows a strong coherence between the binding sites of RBPs of similar families. Augmenting this information with expression changes, we are able to correctly predict the splicing factors that regulate splicing in two gold-standard experiments in which specific splicing factors are knocked down. Conclusions: The methodology presented in this study allows the prediction of active splicing factors in cancer or any other condition using only transcript expression information. This approach opens a wide range of possible studies to understand the splicing regulation of different conditions. A tutorial with the source code and databases is available at https://gitlab.com/fcarazo.m/sfprediction.
National and regional funding
Title:
Prediction of vulnerabilities in cancer using new and interpretable algorithms based on networks and molecular data (INTERPRET)
Project reference:
PID2022-143298OB-I00
Principal Investigator:
Ángel Rubio Díaz-Cordovés, Francisco Javier Planes Pedreño
Funding organisation:
AGENCIA ESTATAL DE INVESTIGACION
Programme/Call:
2022 AEI Knowledge Generation Projects
Starting date:
01/09/2023
Ending date:
31/08/2026
Amount awarded:
112.500,00€
Other Funds:
-
Title:
New computational approach to predict synthetic lethality in cancer (SYNLETHAL)
Project reference:
PID2019-110344RB-I00
Principal Investigator:
Ángel Rubio Díaz-Cordovés, Francisco Javier Planes Pedreño
Funding organisation:
MINISTERIO DE CIENCIA, INNOVACIÓN Y UNIVERSIDADES
Programme/Call:
2019 AEI R&D&I Projects (includes Knowledge Generation and Research Challenges)
Starting date:
01/06/2020
Ending date:
31/05/2023
Amount awarded:
90.750,00€
Other Funds:
-
Title:
Development of diagnostic methods and new therapies in the era of precision medicine in cancer (bG22)
Project reference:
KK-2022-00045
Principal Investigator:
Francisco Javier Planes Pedreño
Funding organisation:
EUSKO JAURLARITZA - GOBIERNO VASCO
Programme/Call:
ELKARTEK Programme 2022
K1: Collaborative Fundamental Research Project - Fundamental Research
Starting date:
01/03/2022
Ending date:
31/03/2024
Amount awarded:
121.787,88€
Other Funds:
-
Title:
Influence of genetic variants on expression (eQTL) and their potential as a biomarker in multiple sclerosis
GV Salud 22_eQTL_ARubio
Project reference:
2022333040
Principal Investigator:
Ángel Rubio Díaz-Cordovés
Funding organisation:
EUSKO JAURLARITZA - GOBIERNO VASCO
Programme/Call:
Department of Health. Grants for health research and development projects 2022
Starting date:
01/01/2023
Ending date:
31/08/2023
Amount awarded:
27.459,58€
Other Funds:
-
Title:
Influence of genetic variants on expression (eQTL) and their potential as a biomarker in multiple sclerosis
Project reference:
2022333040
Principal Investigator:
Ángel Rubio Díaz-Cordovés
Funding organisation:
EUSKO JAURLARITZA - GOBIERNO VASCO
Programme/Call:
Department of Health. Grants for health research and development projects 2022
Starting date:
01/01/2022
Ending date:
31/08/2022
Amount awarded:
27.459,58€
Other Funds:
-